Experiments with Result Diversity and Entity Ranking: Text, Anchors, Links, and Wikipedia
Authors
Abstract
In this paper, we document our efforts in participating in the TREC 2009 Entity Ranking and Web Tracks. We had multiple aims. For the Web Track’s Adhoc task, we experiment with document text and anchor text representations, and with the use of the link structure. For the Web Track’s Diversity task, we experiment with a top-down sliding window that, given the top ranked documents, chooses as the next ranked document the one that has the most unique terms or links. We test our sliding window method on a standard document text index and on an index of propagated anchor texts. We also experiment with extreme query expansions by taking the top n results of the initial ranking as multi-faceted aspects of the topic, constructing n relevance models to obtain n sets of results. A final diverse set of results is obtained by merging the n result lists. For the Entity Ranking Track, we also explore the effectiveness of the anchor text representation, look at the co-citation graph, and experiment with using Wikipedia as a pivot. Our main findings can be summarized as follows. Anchor text is very effective for diversity: it gives high early precision, and the results cover more relevant sub-topics than those from the document text index. Our baseline runs have low diversity, which limits the possible impact of the sliding window approach. New link information seems more effective for diversifying text-based search results than the number of unique terms added by a document. Anchor text is also very effective for entity ranking. Using Wikipedia as a pivot results in a gain in precision, but at the cost of a loss of recall.
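The sliding window idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the window size, the tie-breaking rule, or the exact feature definition, so everything here is an assumption made for illustration: the function name `diversify`, the `window` parameter, the representation of each document as a set of terms or outgoing links, and breaking ties in favour of the baseline order. It is not the authors' implementation.

```python
from typing import Dict, List, Set


def diversify(ranking: List[str],
              features: Dict[str, Set[str]],
              window: int = 10) -> List[str]:
    """Greedy re-ranking sketch (hypothetical, not the paper's code).

    At each step, look at the next `window` documents of the baseline
    ranking and pick the one that adds the most features (terms or links)
    not yet covered by the documents already selected.
    """
    remaining = list(ranking)   # baseline order, best first
    seen: Set[str] = set()      # features covered so far
    result: List[str] = []
    while remaining:
        candidates = remaining[:window]
        # max() returns the first maximal item, so ties keep baseline order.
        best = max(candidates,
                   key=lambda d: len(features.get(d, set()) - seen))
        result.append(best)
        seen |= features.get(best, set())
        remaining.remove(best)
    return result


# Toy data: d2 repeats d1's features, d3 and d4 bring new ones.
baseline = ["d1", "d2", "d3", "d4"]
doc_features = {
    "d1": {"a", "b"},
    "d2": {"a", "b"},
    "d3": {"c", "d"},
    "d4": {"a", "e"},
}
reranked = diversify(baseline, doc_features, window=3)
print(reranked)
```

On this toy input the redundant document `d2` drops to the end of the list. With `window=1` the sketch returns the baseline ranking unchanged, which makes the window size the knob that trades baseline relevance against novelty.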
Similar resources
Result Diversity and Entity Ranking Experiments: Anchors, Links, Text and Wikipedia
In this paper, we document our efforts in participating in the TREC 2009 Entity Ranking and Web Tracks. We had multiple aims: For the Web Track’s Adhoc task we experiment with document text and anchor text representation, and the use of the link structure. For the Web Track’s Diversity task we experiment with using a top down sliding window that, given the top ranked documents, chooses as the n...
Comparative Evaluation of Link-Based Approaches for Candidate Ranking in Link-to-Wikipedia Systems
In recent years, the task of automatically linking pieces of text (anchors) mentioned in a document to Wikipedia articles that represent the meaning of these anchors has received extensive research attention. Typically, link-to-Wikipedia systems try to find a set of Wikipedia articles that are candidates to represent the meaning of the anchor and, later, rank these candidates to select the most...
Using Wikipedia Categories and Links in Entity Ranking
This paper describes the participation of the INRIA group in the INEX 2007 XML entity ranking and ad hoc tracks. We developed a system for ranking Wikipedia entities in answer to a query. Our approach utilises the known categories, the link structure of Wikipedia, as well as the link co-occurrences with the examples (when provided) to improve the effectiveness of entity ranking. Our experiments...
Exploiting Locality of Wikipedia Links in Entity Ranking
Information retrieval from web and XML document collections is ever more focused on returning entities instead of web pages or XML elements. There are many research fields involving named entities; one such field is known as entity ranking, where one goal is to rank entities in response to a query supported with a short list of entity examples. In this paper, we describe our approach to ranking...
NTHU at NTCIR-10 CrossLink-2: An Approach toward Semantic Features
This paper describes the approaches of NTHU in the NTCIR-10 Cross-Lingual Link Discovery task, also named CrossLink-2. In this task, we aim to discover valuable anchors in Chinese, Japanese or Korean (CJK) articles and to link these anchors to related English Wikipedia pages. To achieve the objective, we do not only depend on Wikipedia’s distinguishing features (e.g. anchor links information an...